Clustering Demonstration

This notebook originally comes from Dr. Chandola's version of 474/574.

We will use k-means clustering to show how clustering works, although several other clustering methods exist. The key hyperparameter of k-means is $k$, which specifies the number of clusters.
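For reference, the standard formulation of k-means looks for $k$ centers $\mu_1, \dots, \mu_k$ and a partition $C_1, \dots, C_k$ of the data points $x_i$ that minimize the within-cluster sum of squared distances:

$$
J(\mu_1, \dots, \mu_k) = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2
$$

Assigning each point to its nearest center and moving each center to the mean of its points can each only decrease $J$ or leave it unchanged, which is why the iterative procedure terminates. It reaches a local minimum of $J$, not necessarily the global one.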

Simple k-Means Demo

We first generate synthetic data drawn from four known clusters.
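A minimal sketch of this step, assuming NumPy and 2-D Gaussian clusters. The cluster centers, spread (0.5), and cluster size (50) are hypothetical choices for illustration, not the notebook's actual parameters. Keeping each cluster in its own array preserves the ground-truth membership for later comparison.

```python
import numpy as np

# Hypothetical parameters: four 2-D Gaussian clusters, 50 points each,
# standard deviation 0.5 (not taken from the original notebook).
rng = np.random.default_rng(0)
true_centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0], [5.0, 0.0]])
n_per_cluster = 50

# Draw each cluster separately so the true membership of every point is known.
clusters = [rng.normal(loc=c, scale=0.5, size=(n_per_cluster, 2)) for c in true_centers]
true_labels = np.concatenate([np.full(n_per_cluster, j) for j in range(len(true_centers))])

print([c.shape for c in clusters])
```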

Next, we stack all the data into a single array and select four random initial cluster centers.
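One way to sketch this step in NumPy, assuming the four clusters are regenerated as hypothetical stand-ins for the data above. Here the initial centers are $k$ distinct data points chosen uniformly at random; this is an assumed initialization, and drawing random coordinates inside the data's bounding box is another common option.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for the four clusters generated earlier.
true_centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0], [5.0, 0.0]])
clusters = [rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in true_centers]

# Stack the per-cluster arrays into one (200, 2) data matrix; from here on
# the algorithm never sees which cluster a point came from.
X = np.vstack(clusters)

# Assumed initialization: pick k distinct data points at random as centers.
k = 4
init_idx = rng.choice(X.shape[0], size=k, replace=False)
centers = X[init_idx].copy()

print(X.shape, centers.shape)
```

Choosing actual data points guarantees that every initial center lies where the data is, which makes it less likely (though not impossible) that a cluster ends up empty in the first iteration.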

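We then assign each point to its closest cluster center, recompute each centroid as the mean of the points assigned to it, and iterate until convergence. Notice how quickly k-means converges in this example. This is not always the case, however: both the number of iterations and the final clustering can depend on the initial centers, and k-means may converge to a poor local optimum.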
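The assign-and-update loop is Lloyd's algorithm. A minimal NumPy sketch follows; the function names `assign_points` and `kmeans`, the tolerance-based stopping rule (stop when no center moves by more than `tol`), and the convention that an empty cluster keeps its previous center are all illustrative assumptions rather than the notebook's code. The data are again hypothetical Gaussian blobs.

```python
import numpy as np

def assign_points(X, centers):
    """Return the index of the nearest center (squared Euclidean distance) for each row of X."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # shape (n, k)
    return d2.argmin(axis=1)

def kmeans(X, centers, max_iter=100, tol=1e-8):
    """Lloyd's algorithm: alternate assignment and centroid updates until centers stop moving."""
    centers = np.asarray(centers, dtype=float).copy()
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest current center.
        assign = assign_points(X, centers)
        # Update step: move each center to the mean of its points; an empty
        # cluster keeps its old center (an assumed convention).
        new_centers = np.array([
            X[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(len(centers))
        ])
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift <= tol:
            break
    # Final assignment against the final centers.
    return centers, assign_points(X, centers), n_iter

# Usage on hypothetical data: four well-separated 2-D Gaussian blobs.
rng = np.random.default_rng(0)
true_centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0], [5.0, 0.0]])
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in true_centers])
init = X[rng.choice(len(X), size=4, replace=False)]
centers, assign, n_iter = kmeans(X, init)
print(f"converged after {n_iter} iterations")
print(np.round(centers, 2))
```

Rerunning with a different seed for the initial centers is an easy way to see the sensitivity to initialization: with some starts, two centers can settle inside the same true cluster while another pair of clusters shares a single center.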

k-Means on Handwritten Digit Recognition Data

We can also apply k-means to the handwritten digits data set.
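A hedged sketch of this experiment, assuming scikit-learn is available and using its bundled `load_digits` data (1,797 images of 8x8 pixels, flattened to 64 features). The notebook may use a different copy of the digits data. Choosing $k = 10$ to match the ten digit classes is an assumption, and the true labels are used only for evaluation, never for fitting. Because cluster indices are arbitrary, each cluster is mapped to the most frequent true digit among its members, a common but assumed evaluation heuristic.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

# 8x8 grayscale digit images flattened to 64 features; labels are 0-9.
digits = load_digits()
X, y = digits.data, digits.target

# k = 10 is an assumption matching the number of digit classes.
km = KMeans(n_clusters=10, n_init=10, random_state=0)
clusters = km.fit_predict(X)

# Map each cluster to its majority true digit, then measure agreement.
mapped = np.zeros_like(clusters)
for j in range(10):
    mask = clusters == j
    if mask.any():
        mapped[mask] = np.bincount(y[mask]).argmax()
accuracy = (mapped == y).mean()

print(km.cluster_centers_.shape, round(float(accuracy), 3))
```

Each row of `km.cluster_centers_` can be reshaped to 8x8 and displayed as an image, which shows what a "typical" member of each cluster looks like.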